Review: The AI Revolution in Medicine
authors: Peter Lee, Carey Goldberg, and Isaac “Zak” Kohane
April 2023
The authors of this book should really be credited as editors, since much of the text is taken verbatim from interactions with GPT-4. That’s not necessarily a criticism: a subject like this begs for specific examples. A book that describes how to use Excel, say, or some programming language, will consist mainly of text output directly by the computer.
But they go a little too far when, for example, in Chapter 9 (“Safety First”) they simply create two personas and ask GPT-4 to generate a discussion. The results are interesting and provocative, but I came away wondering why I needed the authors for this. Wouldn’t I be better off just asking GPT-4 these questions myself?
That question is a good microcosm of the whole issue of using GPT-4 in medicine. Did I really need these authors to go to the trouble of writing a book? That’s eerily similar to the question of whether I really need a doctor to tell me that my symptoms are congenital adrenal hyperplasia (one of the GPT-4-generated cases they present).
If you want a more concise summary of how GPT-4 will change medicine, watch Peter Lee’s one-hour YouTube talk: Emergence of General AI for Medicine
GPT-4 often seems better at reviewing text than creating it
Peter concludes that, despite months of trying to find counter-examples, he’s unable to demonstrate conclusively that GPT-4 doesn’t “understand”.
Zak argues that traditional FDA “clinical trials” are not appropriate for GPT-4: a trial is by necessity tightly constrained, while GPT-4’s actions are open-ended.
Or you could treat it the way you do would-be medical doctors: require it to go through the “hoops” intended to weed out those without appropriate abilities (specialized training classes, tests, residencies, etc.). But is there any doubt that GPT-4 would pass all of this?
The concern, by contrast, is that GPT-4 doesn’t share our “values”, because it’s not human.
The Undiagnosed Diseases Network is a registry of unsolved medical cases.
GPT-4 Limitations
- It must be trained offline, so it can’t really be up to date. Web searches can partially compensate, but it’s not the same.
- No long-term memory
- Can’t do population studies that require large amounts of memory
Always ask GPT-4 to “show its work”, especially on critical decisions.
It has difficulty with problems that require “backtracking”, i.e. iterative trial-and-error. For example, it’s terrible at Sudoku, though interestingly, if you ask it to write a program that uses a SAT solver, it can solve Sudoku that way.
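To make that concrete, here’s a minimal sketch of the kind of program GPT-4 can write (my illustration, not code from the book). It assumes the third-party python-sat package; the variable-numbering scheme and helper names are mine:

```python
# Sudoku via SAT: a sketch of the approach described above.
# Assumes the third-party python-sat package (pip install python-sat).
from pysat.solvers import Glucose3

def var(r, c, d):
    # Variable ID for "cell (r, c) holds digit d+1"; r, c, d in 0..8.
    return r * 81 + c * 9 + d + 1

def solve_sudoku(grid):
    # grid: 9x9 list of lists of ints, 0 = blank. Returns solved grid or None.
    s = Glucose3()
    for r in range(9):
        for c in range(9):
            s.add_clause([var(r, c, d) for d in range(9)])        # at least one digit
            for d1 in range(9):
                for d2 in range(d1 + 1, 9):                       # at most one digit
                    s.add_clause([-var(r, c, d1), -var(r, c, d2)])

    def all_different(cells):
        # No digit appears twice among the given cells.
        for d in range(9):
            for i in range(len(cells)):
                for j in range(i + 1, len(cells)):
                    (r1, c1), (r2, c2) = cells[i], cells[j]
                    s.add_clause([-var(r1, c1, d), -var(r2, c2, d)])

    for i in range(9):
        all_different([(i, c) for c in range(9)])                 # rows
        all_different([(r, i) for r in range(9)])                 # columns
    for br in (0, 3, 6):
        for bc in (0, 3, 6):                                      # 3x3 boxes
            all_different([(br + dr, bc + dc) for dr in range(3) for dc in range(3)])

    for r in range(9):                                            # clues as unit clauses
        for c in range(9):
            if grid[r][c]:
                s.add_clause([var(r, c, grid[r][c] - 1)])

    if not s.solve():
        return None
    true_vars = {v for v in s.get_model() if v > 0}
    return [[next(d + 1 for d in range(9) if var(r, c, d) in true_vars)
             for c in range(9)] for r in range(9)]
```

The point is that the solver, not the language model, does the backtracking; GPT-4 only has to produce the encoding.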
see also What AI Can’t Do
Smarter Science
Chapter 8, by Zak Kohane, suggests ways AI can improve research.
i2b2 is a Harvard-developed, open-source framework intended to make it easy for clinical data to be used in research. The idea is to let every patient encounter feed directly into scientific research.
Healthcare providers are supposed to write patient encounters in SOAP format: Subjective, Objective, Assessment, and Plan.
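For readers who haven’t seen one, here’s a hypothetical SOAP note expressed as a structured record (the clinical details are invented for illustration, not taken from the book):

```python
# Hypothetical SOAP note as a structured record; all contents are invented.
soap_note = {
    "Subjective": "45-year-old reports two weeks of fatigue and intermittent chest tightness.",
    "Objective": "BP 148/92, HR 88, ECG normal sinus rhythm, troponin negative.",
    "Assessment": "Atypical chest pain; poorly controlled hypertension.",
    "Plan": "Start antihypertensive, order stress test, follow up in two weeks.",
}
```

Structure like this is what makes ideas like i2b2 feasible: a consistently formatted encounter can be queried for research without a human re-reading every chart.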
Includes mandatory discussions of diversity and of the problems that result when data is collected only from the hospitals that participate (which tend to be Western and ethnically white or Asian).
I believe obtaining diverse patient data is essential, but obtaining it through deals with hospital systems is a mistake. Going to patients directly would allow sampling across geography and socioeconomic strata while respecting patient autonomy.
The promise for basic research is even more compelling:
> a large language model — let’s call it Dr. One-With-Everything or Dr. OWE — that encompasses protein structure, other basic biological databases (like gene regulation and human genetic variation), preclinical studies, and the design and conduct of clinical trials. That encompassing model will likely be the central intellectual tool for biomedical research by the mid-2030s.
Safety First
Although this chapter says it was jointly written by the three authors, it’s more accurate to say it was edited by them, since it mostly consists of reprinted GPT-4 answers representing “potentially opposite poles”, the public interest and the industry interest, voiced by two personas: Barry, a respected doctor who is also a healthcare-system lobbyist, and Darlene, the founder of a patient advocacy group who also happens to be a civil rights lawyer.
But let me say, respectfully, that the “patient advocate” is not advocating for me. Rather, from my perspective she’s another off-the-shelf left-wing activist, focused more on group identity than on individual choice. Without knowing anything else about Darlene, I bet you can guess her political opinions on everything from nuclear power and GMOs to the war in Ukraine and Trump. In other words, she hates people like me.
Neither of these perspectives really advocates for science, and neither assumes agency or intelligence on the part of individuals. How about if the AI just makes the absolute best cutting-edge scientific judgments and lets me decide for myself?
Dr. Herman Taylor, a Harvard-trained cardiologist and head of the Cardiovascular Research Institute at Morehouse School of Medicine, is now leading a study comparing the assessments of GPT-4 with those of expert cardiologists.
> One of us (Zak) is Editor-in-Chief of a new medical journal, The New England Journal of Medicine AI, and …
Peter argues:
> The current FDA framework around Software as a Medical Device (SaMD) probably is not applicable. This is especially true for LLMs like GPT-4 that have been neither trained nor offered specifically for clinical use. And so while we believe this new breed of AI does require some form of regulation, we would urge regulators not to default automatically to regulating GPT-4 and other LLMs as SaMDs, because that would act as an instant, massive brake on their development for use in healthcare.
>
> If we want to use an existing framework for regulating GPT-4, the one that exists today is the certification and licensure that human beings go through.
Also see
Tyler Cowen’s Recommendation:
> this book is the documentation, definitely recommended, especially for the skeptics.